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Bi-orthogonally  Accessed  3-D  Two-photon  Memory 
for  Relational  Database  Operations 


Brita  Olson  and  Philippe  J.  Marchand 

Very  large  relational  database  machines  require  increased  memory  bandwidth  and 
parallel  processing  to  complete  queries  in  a  reasonable  amount  of  time.  Currently  these 
machines  achieve  their  capacity  and  throughput  by  employing  an  array  of  serial  access  disks  and 
processors  As  these  machines  grow,  however,  so  will  their  need  for  high  capacity^andwidth 
storage  systems.  3D  parallel  access  storage  devices  have  the  potential  of  being  able  to  achieve 
enormous  capacity  (1  Tbit/cm^)^  as  well  as  throughput  (1  Tbit/sec)  and  seem  well  suited  for  this 
application.  The  bit-oriented  3D  two-photon  memory  has  a  further  feature  of  being  able  to  be 
accessed  in  a  number  of  directions.  This  report  examines  the  potential  performance  of  a  bi- 
orthogonally  accessed  3D  two-photon  memory  for  relational  database  operations  with  a  data 
organization  scheme  particular  to  this  approach.  The  favorable  results  of  this  study  lead  to 
another  study  in  which  the  feasibility  of  building  an  optoelectronic  database  data  filter  to 
interface  with  the  memory  was  considered. 

1.  BACKGROUND 

As  databases  grow  in  size,  they  require  increased  memory  bandwidth  and  parallel 
processing  to  complete  queries  in  a  reasonable  amount  of  time.  Currently  very  large  database 
machines  achieve  their  capacity  and  throughput  by  employing  an  array  of  serial  access  disks  and 
processors  The  bandwidth  of  these  storage  systems  can  be  expanded,  if  the  memories  in  the 
array  are  accessed  in  parallel.  Mitkas  and  Berra  have  concluded  that  a  relational  database 
machine  utilizing  a  single  parallel  access  optical  disk  and  a  parallel  optoelectronic  processor  has 
the  potential  of  outperforming  a  relational  database  system  based  on  an  array  of  30  serial  access 
disks  each  with  its  own  processor  However,  parallel  access  optical  disks  will  not  achieve 
capacities  much  greater  than  those  of  serial  access  optical  disks. 

3D  parallel  access  optical  memories  are  capable  of  achieving  tremendous  capacity  in 
addition  to  throughput  by  storing  data  throughout  the  volume  of  the  memory  device  as  opposed 
to  on  the  surface.  Recent  advances  in  both  volume  holographic  ^  and  3D  two-photon®  page 
access  optical  memories  show  great  promise  for  these  high  capacity /band  width  storage  devices. 
Mitkas  and  Irakliotis  ’  have  suggested  their  use  in  relational  database  systems. 

In  relational  database  machines  the  performance  of  operations  depends  not  only  on  the 
memory  bandwidth,  but  also  on  how  the  data  is  positioned  and  consequently  accessed  from  the 
memory.  The  optimal  data  organization/access  strategy  for  one  database  operation  is  not  always 
best  for  another.  A  further  property  of  the  3D  two-photon  memory  is  that  pages  of  data  can  be 
retrieved  from  it  in  a  number  of  directions;  potentially  allowing  for  a  system  in  which  a  user  can 
use  different  memory  access  approaches  for  different  operations.  Recently  it  has  been  suggested 
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that  accessing  a  3D  two-photon  memory  in  two  orthogonal  directions  might  be  beneficial  for 
relational  database  operations  In  this  report,  we  describe  a  particular  3D  data  organization 
strategy  for  a  relational  database  system  which  takes  advantage  of  this  accessing  approach  and 
evaluate  its  performance  as  a  function  of  system  parameters  using  the  Wisconsin  benchmark 

Our  analysis  shows  that  3D  parallel  access  optical  memories,  in  general,  are  well  suited 
for  large  relational  database  systems  with  low  write  requirements  because  of  their  potential 
capacity  and  throughput.  The  bi-orthogonal  accessing  approach  used  in  conjunction  with  the 
proposed  data  organization  strategy  is  found  to  be  additionally  advantageous.  The  time  required 
to  retrieve  data  for  a  projection  operation,  for  example,  is  close  to  the  optimum  value.  This  is 
likely  not  be  the  case  for  a  system  with  traditional  page  access  and  employing  a  data  organization 
strategy  optimized  for  another  operation. 

We  begin  this  report  with  an  introduction  of  relational  database  operations  and  their 
memory  accessing  requirements.  Following  this  we  give  a  brief  description  of  the  subset  of  data 
filtering  operations  from  the  Wisconsin  benchmark  which  we  used  to  evaluate  performance.  In 
Section  2,  we  provide  a  description  of  the  bi-orthogonal  memory  itself  and  the  proposed  3D  data 
organization  strategy.  We  also  identify  problems  such  as  memory  fragmentation  which  can  lead 
to  degraded  performance  in  real  systems.  In  Section  3,  we  analyze  the  potential  performance  of 
a  bi-orthogonal ly  accessed  3D  two-photon  memory,  first  considering  the  performance  in  an  ideal 
system,  and  then  we  introduce  the  effect  of  the  problems  described  in  Section  2.  In  Section  4, 
we  compare  the  potential  performance  of  a  3D  two-photon  memory  to  that  obtained  with  other 
proposed  or  existing  relational  database  systems. 

The  favorable  results  of  this  study  lead  to  another  study  in  which  the  feasibility  of 
building  an  optoelectronic  data  filter  to  interface  with  this  memory  was  investigated.  In 
particular,  power  consumption,  volume  and  packaging  requirements  were  considered. 

2.  OPERATION  AND  SYSTEM  DESCRIPTION 
2.1  Relational  database  operations 

In  relational  databases,  the  data  records,  also  called  tuples,  are  grouped  into  sets  called 
relations.  All  records  belonging  to  a  relation  must  be  unique.  Figure  2-1  depicts  a  relation 
comprised  of  student  records.  The  columns  are  termed  fields  or  attributes,  and  the  rows  are  the 
records.  Relational  database  operations  can  be  categorized  into  two  groups:  set  operations  and 
relation-oriented  operations  ^ .  Some  of  these  operations  are  performed  on  data  from  a  single 
relation.  Others  combine  data  from  several  relations.  The  result  of  an  operation  is  always  a 
relation. 

If  the  data  in  a  memory  supporting  bi-orthogonal  access  is  arranged  in  a  manner  similar 
to  that  shown  in  Figure  2-1,  a  field  or  set  of  fields  can  be  retrieved  in  one  accessing  direction 
without  retrieving  other  fields  that  are  not  of  interest.  In  the  other  accessing  direction,  a  record 
or  set  of  records  can  be  retrieved  without  retrieving  other  records  that  are  not  of  interest.  These 
two  particular  accessing  features  which  we  term  record  parallel  and  field  parallel  access  are 
particularly  beneficial  for  relational  database  operations  as  they  allow  a  user  to  better  isolate  the 
data  desired  for  a  given  operation.  We  shall  explore  this  further  as  we  give  examples  of  some 
relational  database  operations. 
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Relation  1:  Student  Records 


student  ID 

Name 

GPA 

Major 

Address 

07839120 

B. Jones 

4.0 

economics 

9100  Regents  Rd,  SD  CA  92037 

07839121 

T.  Slash 

3.7 

engineering 

221  Donex  Ave,  SD  CA  92117 

07839122 

R.  Kelsey 

2.6 

philosophy 

3007  Ivy  St,  SDCA  92410 

07839123 

K.  Smith 

2.8 

economics 

102  Hall  St,  SD  CA  92109 

1  07839124 

S.  Jensen 

3.2 

biology 

1765  Filbert  St,  SD  CA  92116  J 

1 

record 

Figure  2-1 :  An  example  of  a  relation  which  consists  of  student  records.  The  rows  of  the  table 
are  termed  records  or  tuples.  The  columns  are  fields  or  attributes. 

A  relation  can  be  thought  of  as  a  set,  and  its  records  as  elements.  The  set  operations  are 
the  traditional  set  operations;  union,  difference  and  Cartesian  product.  These  are  not  described 
here.  Descriptions  can  be  found  in  a  standard  database  text  The  basic  relation-oriented 
operations  are;  selection,  projection,  and  join. 

The  selection  operation,  shown  in  Figure  2-2,  is  performed  on  data  from  a  single  relation. 
It  involves  choosing  records  from  a  relation  based  on  the  value  of  a  particular  field  or  set  of 
fields.  <,  >,  >,  <,  =  and  Boolean  complement  are  all  valid  selection  operations.  To  perform  this 
operation,  records  satisfying  the  query  must  be  identified  and  retrieved.  Frequently  this  can 
involve  retrieving  and  examining  all  records  which  can  take  an  enormous  amount  of  time  in 
large  databases.  If  only  a  few  records  satisfy  the  selection  query,  the  amount  of  data  that  needs 
to  be  retrieved  for  this  operation  can  be  significantly  reduced  if  the  operand  is  retrieved  using 
field  parallel  access,  and  the  records  determined  to  satisfy  the  selection  query  are  retrieved  using 
record  parallel  access.  In  some  instances,  the  value  of  the  selection  operand  may  directly 
correspond  to  its  memory  address.  In  this  case,  record  parallel  access  might  be  preferable. 

Selection  Example;  Which  students  have  a  GPA  of  3.0  or  higher? 


student  ID 

Name 

GPA 

Major 

Address 

07839120 

B.  Jones 

4.0 

economics 

9100  Regents  Rd,  SD  CA  92037 

07839121 

T.  Slash 

3.7 

engineering 

221  Donex  Ave,  SD  CA  92117 

07839124 

S.  Jensen 

3.2 

biology 

1765  Filbert  St,  SD  CA  92116 

Figure  2-2;  Relation  resulting  from  selection  query  STUDENT  RECORDS(GPA  >  3.0).  <,  >, 
>,  <,  =  and  Boolean  complement  are  all  valid  selection  operations. 

With  a  projection  operation,  all  the  data  from  a  relation  which  belongs  to  a  single  field  or 
set  of  fields  is  extracted  to  form  a  new  relation  with  fewer  fields.  True  projection  also  involves 
duplicate  removal;  otherwise,  the  result  would  not  be  a  relation.  Once  again,  if  the  relation  is 
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large,  the  process  of  extracting  the  data  for  this  operation  can  be  quite  time  consuming.  If  record 
parallel  access  was  used  to  perform  this  operation  all  the  data  in  the  relation  would  have  to  be 
retrieved.  If  field  parallel  access  were  used  instead,  the  amount  of  data  that  needs  to  be  retrieved 
can  be  significantly  reduced  since  the  desired  field  or  set  of  fields  can  be  obtained  directly. 

Projection  Example:  What  Majors  do  students  have? 


economics 

economics 

engineering 

duplicate  removal 

engineering 

philosophy 

- ^ 

philosophy 

economics 

biology 

biology 

Figure  2-3:  Relation  resulting  from  the  projection  operation  STUDENT  RECORDS(Major). 
With  projection,  a  single  field  or  several  fields  are  extracted  from  a  relation.  Duplicate  removal 
is  necessary  to  guarantee  that  records  in  the  resulting  relation  are  unique. 

A  join  operation  combines  relations.  To  illustrate  join  a  second  relation,  depicted  in 
Figure  2-4,  is  introduced  which  contains  records  corresponding  to  companies  that  might  hire 
students.  The  result  of  a  join  operation  is  a  subset  of  the  Cartesian  product  of  the  two  relations. 
Only  those  elements  in  the  Cartesian  product  which  satisfy  the  join  query  are  in  the  join  result. 
As  with  selection  <,  >,  >,  <,  =  and  Boolean  complement  are  all  valid  join  operations. 

Relation  2:  Companies  hiring  students 


Company 

Number  of 
positions  open 

Target 

Major 

ABC  investments 

0 

econonics 

Advanced  Technology 

'  0 

engineering 

XYZ  BioTech 

0 

biology 

American  Banking 

0 

enonomics 

Burger  King 

1 

philosophy 

Figure  2-4;  Relation  containing  companies  potentially  hiring  students. 

Given  the  I/O  and  computational  demands  of  the  above  operations,  there  are  several 
approaches  used  to  improve  performance.  Operations  executed  by  a  relational  database  machine 
are  frequently  complex,  that  is,  they  involve  several  basic  operations.  Selecting  student-company 
pairs  from  the  previous  join  example  where  companies  have  positions  open  is  an  example  of  a 
complex  operation.  Significant  performance  improvement  can  be  gained  by  reordering 
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operations  in  complex  operations  In  the  above  example,  performing  the  selection  first  would 
reduce  the  complexity  of  the  join. 

Join  Example: 

Which  students  should  be  paired  with  which  companies  based  on 
their  majors? 


Student  ID 

Name 

GPA 

Major 

JI- 

Number  of 
positions 

Company 

07839120 

B.  Jones 

4.0 

economics 

nr 

0 

ABC  investments 

07839121 

T.  Slash 

3.7 

engineering 

0 

Advanced  Technology 

07839123 

K,  Smith 

2.8 

economics 

0 

ABC  investments 

07839124 

S.  Jensen 

3.2 

biology 

0 

XYZ  BioTech 

07839120 

B. Jones 

4.0 

economics 

0 

American  Banking 

07839123 

K.  Smith 

2.8 

economics 

0 

American  Banldng 

07839122 

R.  Kelsey 

2.6 

philiosophy 

1 

Burger  King 

Figure  2-5:  Example  of  a  join  operation.  Company  records  are  appended  to  student  records  if 
the  student’s  major  equals  the  company’s  target  major. 

Another  way  to  improve  performance,  is  to  execute  an  operation  in  parallel.  In  addition 
to  requiring  parallel  hardware,  this  approach  requires  increased  memory  bandwidth.  In  current 
systems  this  bandwidth  is  achieved  with  an  array  of  serial  disks,  but  it  can  also  be  achieved  with 
parallel  access  storage. 

2.2  Database  data  filter 

Performance  can  also  be  gained  if  one  filters  out  data  deemed  irrelevant  to  a  particular 
query  immediately  when  it  is  retrieved  from  secondary  storage.  A  device  providing  this 
functionality  is  referred  to  as  a  data  filter.  This  study  assumes  that  an  optoelectronic  data  filter  is 
used  in  conjunction  with  the  3D  two-photon  memory.  A  number  of  other  optoelectronic 
database  data  filters  have  been  proposed  The  data  filter  considered  in  this  study  is 

assumed  to  support  operations  that  can  easily  be  done  on  the  fly  selection,  and  the  operations 
projection  and  selection-projection,  both  without  duplicate  removal.  To  further  improve 
performance,  some  data  filters  transform  or  reorganize  the  retrieved  data  in  addition  to  reducing 
it.  The  complexity  of  an  equijoin  operation  (a  join  where  the  comparison  operation  checks  for 
equality),  for  example,  can  be  reduced  if  the  two  relations  are  hashed  on  the  field  over  which, 
they  are  being  joined  Data  transformation  and  reorganization  were  not  considered  in  this 
study.  Filters  providing  this  functionality  could,  however,  be  added. 
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parallel  processing  chip 


Figure  2-6:  3D  optical  memory  with  optoelectronic  database  data  filter.  The  data  filter  consists 
of  a  compander  which  is  flip-chip  bonded  onto  an  electronic  processing  chip.  The  compander  is 
a  modified  CCD  array  which  detects  pages  retrieved  from  the  memory  and  interfaces  this  data 
with  the  electronic  processing  chip.  The  specifics  of  the  compander  and  the  electronic 
processing  chip  will  be  described  in  later  sections. 

The  data  filter  that  was  assumed  in  this  study  is  shown  in  Figure  2-6.  It  consists  of  a 
compander  which  is  flip-chip  bonded  onto  an  electronic  chip  capable  of  performing 
comparison  and  data  masking  operations  in  parallel.  The  compander  is  a  modified  CCD  array 
which  is  used  to  detect  the  data  from  the  memory  and  interface  it  with  the  electronic  processing 
chip.  It  will  be  described  in  more  detail  in  later  sections. 

2.3  Benchmark  description 

Numerous  benchmarks  have  been  proposed  to  evaluate  database  systems  ’®.  We  rely  on 
the  Wisconsin  benchmark,  as  it  is  commonly  used  to  evaluate  the  performance  of  parallel 
relational  database  machines  including  optoelectronic  ones.  For  clarity,  portions  of  the 
benchmark  that  were  used  are  briefly  described.  The  benchmark  allows  databases  of  different 
sizes  to  be  evaluated  by  scaling  the  size  of  relations.  The  operations  considered  in  this  study  are 
performed  on  the  same  sized  relation.  However,  other  relations  would  exist  in  the  memory  at  the 
same  time. 

The  performance  of  the  selection  operation  is  examined  for  different  selectivities.  The 
selectivity  of  an  operation  refers  to  the  number  of  records  that  satisfy  a  selection  query.  A 
relative  selectivity  of  10%  means  that  10%  of  the  records  in  the  relation  satisfy  the  query.  An 
absolute  selectivity  of  100  records  means  that  100  records  satisfy  the  query  regardless  of  the 
relation's  size.  The  Wisconsin  benchmark  also  requires  that  the  performance  of  selection 
operations  be  measured  using  different  types  of  indexing  as  well  as  no  indexing.  A  relation  with 
an  indexed  field  is  organized  in  some  way  (B-tree,  hashing... etc.)  according  to  the  value  of  a 
particular  field  or  set  of  fields;  an  index  key  is  assigned  to  each  record  based  on  this  value.  With 
clustered  indexing,  the  index  key  determines  the  physical  location  of  the  data.  With  non- 


6 


clustered  indexing,  the  index  key  is  used  to  determine  the  address  of  a  record  or  set  of  records. 
Non-clustered  indexing  is  not  always  preferable  to  no-indexing.  Insertion  of  records  has  a  much 
higher  cost  with  indexing.  Also,  for  selection  operations  with  moderately  high  selectivity  (1%  - 
10%),  a  crude  scan  of  all  the  records  (selection  with  no  indexing)  sometimes  is  comparable  in 
performance  or  even  preferable  to  non-clustered  indexing  The  Wisconsin  benchmark 
considers  the  performance  of  the  different  types  of  selection  operations  for  a  discrete  set  of 
selectivities:  an  absolute  selectivity  of  one,  and  relative  selectivities  of  1%  and  10%.  It  does  not 
require  selection  with  non-clustered  indexing  or  no  indexing  for  an  absolute  selectivity  of  one.  It 
also  does  not  require  selection  with  non-clustered  indexing  for  a  selectivity  of  10%. 

In  the  sections  that  follow,  the  performance  of  various  selection  and  projection  operations 
is  examined  for  a  bi-orthogonally  accessed  3D  optical  memory  system.  The  specific  operations 
considered  in  this  study  are  the  same  as  those  in  the  Wisconsin  benchmark  except  for  the 
selectivity  in  selection  operations,  and  the  operand  size  in  the  projection  operation,  both  of  which 
were  varied  continuously  instead  of  being  assigned  discrete  values  First,  though,  we  briefly 
describe  the  memory  and  the  proposed  data  organization  scheme  defining  terms  that  shall  be 
used  for  the  duration  of  the  report. 

2.4  Bi-orthogonally  accessed  3D  two-photon  memory 

A  parallel  access  3D  two-photon  memory  is  a  random  access  memory  in  which  bits  are 
suspended  throughout  the  volume  of  a  cube  or  rectangular  solid,  each  occupying  a  unique 
physical  location.  Data  are  written  via  two  photon  excitation  by  intersecting  two  laser  beams  of 
different  wavelength  at  any  location  within  the  memory.  An  entire  page  of  bits  can  be  written  by 
intersecting  a  spatially  modulated  information  beam  with  an  addressing  beam  as  shown  in  Report 
4.  A  page  written  in  this  way  can  be  retrieved  by  re-illuminating  it  with  an  identically  positioned 
addressing  beam,  shown  in  Figure  2-7.  The  potential  capacity  of  these  memories  is  high  (1 
Tbit/cm^)  ^  because  bits  are  stored  in  a  volume  as  opposed  to  on  a  planer  surface.  Throughput  is 
also  high  (1  Tbit/sec)  because  the  pages  themselves  can  contain  10^  bits  and  be  accessed  in  a 
relatively  short  amount  of  time.  These  particular  memories  can  also  be  accessed  in  orthogonal 
directions,  as  shown  in  Figure  2-8.  The  ability  to  retrieve  planes  in  two  orthogonal  directions  is 
termed  bi-orthogonal  access. 


Figure  2-1  \  A  page  of  bits  can  be  retrieved  from  a  3D  two-photon  memory  by  illuminating  it 

with  a  plane  of  light. 
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Figure  2-8;  Data  is  organized  in  a  bi-orthogonally  accessed  3D  two-photon  memory  such  that 
complete  records  are  mapped  onto  y-z  planes.  These  records  are  arranged  on  these  planes  such 
that  data  belonging  to  the  same  field  are  on  the  same  x-y  plane.  Accessing  a  y-z  plane  in  parallel 
is  referred  to  as  record  parallel  access.  Accessing  an  x-y  plane  is  referred  to  as  field  parallel 

access. 


2.5  Memory  mapping 

The  data  organization  scheme  that  is  proposed  is  explained  with  reference  to  Figure  2-8. 
Complete  records  are  mapped  onto  the  memory  so  that  they  are  contained  on  a  single  y-z  plane. 
These  records  are  arranged  on  these  planes  such  that  data  belonging  to  the  same  field  or  set  of 
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fields  are  contained  on  the  same  x-y  plane.  Retrieving  a  y-z  page  containing  complete  records 
results  in  record  parallel  access.  Retrieving  an  x-y  page  results  in  field  parallel  access.  With 
record  parallel  access,  a  small  set  of  records  can  be  retrieved  in  parallel  in  one  memory  read 
retrieved  without  retrieving  other  records  that  are  not  of  interest.  With  field  parallel  access,  a 
particular  field  or  set  of  fields  can  be  retrieved  without  retrieving  other  fields  that  are  not  of 
interest. 

Performance  is  improved  by  choosing  the  best  accessing  method  for  a  given  operation. 
Usually  projection  is  best  accomplished  with  field  parallel  access  so  that  only  the  desired  field  or 
set  of  fields  needs  to  be  retrieved.  This  readout  mode  is  also  advantageous  for  certain  selection 
operations  as  it  provides  an  efficient  means  of  scanning  through  a  set  of  records  to  determine 
which  records  satisfy  a  particular  selection  query;  only  the  operand  on  which  the  selection 
operation  is  based  needs  to  be  retrieved.  If  it  is  determined  that  a  record  satisfies  the  selection 
query,  it  can  be  retrieved  in  one  page  read  using  record  parallel  access.  To  read  out  a  single 
record  using  field  parallel  access  would  require  many  more  page  reads.  In  the  sections  that 
follow,  the  effect  that  the  two  accessing  approaches  have  on  performance  is  examined. 

Although  physically  the  memory  is  roughly  a  cube,  for  reasons  described  in  reference 
the  bits  in  the  memory  are  not  cubes;  thus  the  memory  has  different  linear  capacities  in  different 
directions.  In  this  report,  however,  the  stored  bits  are  represented  as  cubes  and  the  memory  as  a 
rectangular  solid  with  the  long  side  corresponding  to  a  direction  with  a  higher  linear  data  density. 
The  memory  is  divided  into  bit  cubes  called  super-blocks,  shown  in  Figure  2-9.  A  super-block 
can  be  viewed  as  a  sequence  of  pages  that  can  be  accessed  randomly  in  either  of  two  orthogonal 
directions.  Each  super-block  contains  bits  and  has  M  bits  on  a  side;  thus  pages  read  from 
these  super  blocks  contain  bits.  The  time  required  to  access  a  page  is  t. 

The  data  organization  scheme  that  is  proposed  is  illustrated  in  Figure  2-10.  Records  are 
arranged  on  y-z  planes,  so  that  each  x-y  plane  contains  w  bits  of  a  record.  It  is  assumed  that  w 
divides  M.  The  parameter  w  affects  both  capacity  and  performance,  and  its  effect  will  be 
examined  throughout  the  paper.  For  example,  it  affects  the  time  required  to  retrieve  a  record 
using  field  parallel  access;  approximately  r/w  page  reads  are  required  to  retrieve  a  record  of  size 
r.  We  refer  to  the  set  of  r/w  x-y  pages  needed  to  contain  a  complete  record  as  a  block. 
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record  parallel 
access 


Figure  2-9:  The  memory  is  divided  into  bit  cubes  called  super-blocks.  A  super-block  can  be 
viewed  as  a  sequence  of  pages  that  can  be  accessed  randomly  in  either  of  two  orthogonal 
directions  from  any  super-block  in  time  t  Each  super-block  contains  with  M  bits  on  a  side; 
thus  pages  read  from  these  super  blocks  contain  bits. 
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Figure  2-10:  Records  are  mapped  onto  the  memory  such  that  they  are  contained  on  single  y-z 
planes  and  the  data  on  these  planes  are  arranged  so  each  page  accessed  in  the  field  parallel 
direction  contains  w  bits  of  a  record.  A  complete  record  of  size  r  can  be  accessed  in  one  memory 
read  utilizing  record  parallel  access  or  in  approximately  r/w  page  reads  using  field  parallel 
access.  This  set  of  x-y  pages  is  referred  to  as  a  block. 

With  the  proposed  data  organization  scheme,  there  will  sometimes  be  gaps  in  the  memory 
which  contain  no  data.  This  memory  fragmentation  problem  will  affect  capacity  and  also 
performance,  since  pages  retrieved  from  the  memory  will  not  always  be  full.  Gaps  can  occur  in 
two  ways  and  are  dependent  on  the  value  of  w.  It  is  assumed  that  data  from  multiple  fields  can 
be  contained  on  the  same  field  parallel  page,  and  that  the  data  in  each  group  of  w  bits  may 
belong  to  different  records.  The  first  type  of  gap  is  illustrated  in  Figure  2-11  and  leads  to 
memory  fragmentation  referred  to  as  type  I.  This  particular  kind  of  memory  fragmentation  is 
likely  to  have  little  effect  and  is  neglected  in  this  study. 
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complete  record 


Figure  2-11:  Page  retrieved  with  record  parallel  access  showing  gaps  leading  to  memory 

fragmentation  of  type  1. 


Another  kind  of  memory  fragmentation,  termed  type  II,  is  more  significant.  It  is  further 
assumed  that  a  record  can  always  be  retrieved  with  one  memory  read.  The  problem  which  can 
arise  is  explained  with  reference  to  Figure  2-12.  This  figure  shows  two  pages  retrieved  with 
record  parallel  access,  each  employing  a  different  record  placement  strategy,  i.e.,  different  values 
for  w.  In  this  figure  w’  and  w”  are  chosen  so  that  they  divide  M.  However,  P’  and  P”  which  are 
roughly  r/w’  and  r/w”,  respectively,  do  not  divide  M.  As  a  result,  some  memory  on  each  record 
parallel  page  will  not  be  used.  Correspondingly,  entire  field  parallel  pages  which  are 
perpendicular  to  this  will  be  empty  as  well.  On  average  there  will  be  P/2  such  pages  per  super¬ 
block,  where  P  ~=  r/w  is  the  number  of  pages  in  a  block.  The  capacity  penalty  which  results 
from  this,  on  average  increases  when  w  is  reduced.  This  fragmentation  problem  also  affects  data 
retrieval  with  record  parallel  access;  with  field  parallel  access,  the  empty  pages  could  be  avoided. 
We  use  parameter  a  to  represent  the  data  retrieval  efficiency  due  to  memory  fragmentation  when 
record  parallel  access  is  used.  In  Figure  2-12,  it  is  the  ratio  of  the  area  of  the  shaded  region  over 
the  area  of  the  entire  page.  Mathematically  this  can  be  expressed  as: 


(0.1) 
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unused  memory  P"  ~=  r/w" 

Figure  2-12:  Memory  fragmentation  of  type  n  can  reduce  capacity,  and  bit  retrieval  efficiency 
with  record  parallel  access.  Two  pages  retrieved  with  record  parallel  access  are  shown,  w’  and 
w”  are  chosen  so  that  they  divide  M.  It  is  likely  that  P’  and  P”  won’t  divide  M.  As  a  result, 
some  memory  on  each  record  parallel  page  will  not  be  used.  Correspondingly,  entire  field 
parallel  pages  which  are  perpendicular  to  this  will  be  empty  as  well.  The  capacity  penalty  which 
results  from  this,  on  average  increases  when  w  is  reduced. 

Another  factor  that  can  have  an  even  greater  impact  on  performance  is  packing.  A 
relation  may  not  completely  fill  all  the  super-blocks  in  which  it  is  contained;  it  is  also  likely  to 
start  and  end  in  the  middle  of  a  super-block.  As  a  result  of  this,  the  first  and  last  super-blocks 
will  usually  contain  other  relations  or  be  empty.  Records  residing  in  these  partially  filled  super¬ 
blocks  can  be  placed  such  that  they  fill  field  or  record  parallel  pages  first.  This  is  shown  in 
Figure  2-13.  In  general  if  field  parallel  pages  are  filled  first,  the  time  to  perform  operations  using 
record  parallel  access  will  approximately  increase  by  a  factor  of  1/B,  where  B  is  the  minimum 
number  of  super-blocks  required  for  the  operation.  This  is  because  on  average  two  additional 
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half  super-blocks  of  data  will  have  to  be  read  that  do  not  contain  the  relation  of  interest.  The 
time  required  to  perform  operations  using  field  parallel  access  is  not  affected.  If,  on  the  other 
hand,  record  parallel  pages  are  filled  first,  the  record  parallel  access  time  is  not  affected,  but  the 
field  parallel  access  time  is  increased  by  a  factor  of  1/B.  The  effect  of  adverse  packing  decreases 
for  larger  relations,  but  can  be  significant  for  small  relations  when  there  are  few  super-blocks  and 
B  is  small. 


a) 


Figure  2-13:  If  field  parallel  pages  are  packed  first,  a),  some  pages  retrieved  with  record  parallel 
access  will  contain  data  from  other  relations.  If  record  parallel  pages  are  packed  first,  b),  field 

parallel  access  can  be  adversely  affected. 
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3,  PERFORMANCE  STUDY 


3.1  Performance  study  parameters 

In  this  section  we  analyze  the  potential  performance  of  a  hi -orthogonally  accessed  3D 
optical  memory  for  a  subset  of  relational  database  operations  commonly  used  in  data  filters.  The 
particular  operations  that  were  considered  were  projection  and  selection  with  and  without 
mixing.  We  begin  by  first  looking  at  the  effect  that  w  and  the  accessing  method  have  on 
performance  in  an  ideal  situation,  neglecting  the  effect  of  memory  fragmentation  and  packing 

Later  we  consider  the  effect  that  these  two  factors  can  have. 

For  clarity  the  parameters  and  values  that  are  used  in  the  equations  and  graphs  in  this  and 
the  following  section  are  summarized  in  Table  3-1.  When  numerical  results  are  given  it  is 
assumed  that  the  relation  contains  10«  records.  This  particular  relation  size  was  chosen  to 
facilitate  companson  with  other  existing  or  proposed  relational  database  machines.  A  single  two 
photon  memory  could  hold  a  much  larger  sized  relation. 

3.2  Ideal  projection 

The  projection  operation  requires  that  a  field  or  set  of  fields  be  extracted  from  a  relation 
to  form  a  new  relation.  This  operation  also  requires  duplicate  removal.  In  our  analysis,  we  only 
considered  the  time  that  would  be  required  to  retrieve  the  data  for  the  operation  not  the  time  for 
duplicate  removal  as  it  would  have  required  the  analysis  of  a  complete  database  system.  The 
times  given  for  projection  in  this  section,  therefore,  cannot  be  compared  to  times  obtained  for 
other  systems  using  the  Wisconsin  benchmark,  since  the  benchmark  which  requires  duplicate 

The  projection  operation  is  almost  always  best  performed  with  field  parallel  access  since 
only  the  desired  field  or  set  of  fields  needs  to  be  retrieved.  With  record  parallel  access  the  entire 
relation  always  needs  to  be  retrieved.  The  time  required  to  retrieve  the  data  necessary  for  a 
projection  operation  with  record  parallel  access  is  given  below.  It  is  simply  the  time  required  to 
retrieve  a  single  page,  t,  multiplied  by  the  number  of  record  parallel  pages  needed  to  store  the 
relation.  In  this  equation  and  equations  that  follow,  “[  J  "denotes  the  integer  less  than  or  equal 

to  the  operand  in  brackets  and  “fj”  denotes  the  integer  greater  than  or  equal  to  the  operand  in 
brackets. 


The  time  required  to  retrieve  the  data  using  field  parallel  access  is  given  in  equation  0  3 
assuming  the  desired  field  or  set  of  fields  is  p  bits.  The  first  two  terms  represent  the  time’ 
required  to  retneve  the  p  bits  from  a  single  block  in  the  memoiy.  The  last  term  is  the  number  of 
blocks  needed  for  the  relation. 


JP 

- 1 

_ 1 

w 

(0.3) 
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parameter 

description 

value  or  expression 

R 

number  of  records  in 
relation 

10  6 

r 

record  size 

(208x8)  bits 

memory  page  size 

1024^  =  10  6  bits 

M 

number  of  memory  pages 
in  a  super-block 

1024 

t 

time  to  access  a  memory 
page 

10  usee 

w 

field  parallel  word  size 

variable 

a 

page  retrieval  efficiency 
due  to  fragmentation  of 
type  II 

1 

B 

minimum  number  of 
super-blocks  required  to 
store  the  relation 

'  rR  ' 
aM^ 

S  2 

N 

number  of  records 
satisfying  the  selection 
query 

variable 

f 

operand  size  for  selection 
operation 

32  bits 

P 

operand  size  for 

projection  operation 

variable 

Table  3-1 :  Variables  used  for  performance  calculations 


The  time  required  to  retrieve  the  data  for  a  projection  operation  with  the  two  accessing 
methods  is  plotted  in  Figure  3-14  as  a  function  of  the  operand  size.  The  operand  size  is 
augmented  in  byte  increments  even  though  this  could  physically  correspond  to  reading  out 
fractions  of  fields.  The  superior  performance  of  field  parallel  access  can  clearly  be  seen.  For 
small  operand  sizes  and  small  values  of  w,  the  time  required  to  retrieve  the  data  with  field 
parallel  access  can  require  a  factor  of  50  less  time  than  with  record  parallel  access.  The  jagged 
appearance  of  the  field  parallel  trace  results  from  the  fact  that  the  operand  size  p  is  not  always  a 
multiple  of  w  bits.  The  time  required  to  perform  this  operation  with  field  parallel  access  is 
minimized  when  p  is  a  multiple  of  w  which  is  more  likely  to  occur  when  w  is  small. 
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percentage  of  record  desired 


Figure  3-14;  Time  to  retrieve  the  data  required  for  a  projection  operation.  With  field  parallel 
access  only  the  desired  field  or  set  of  fields  needs  to  be  retrieved.  With  record  parallel  access, 
the  entire  relation  needs  to  be  retrieved.  This  graph  neglects  seek  time  and  the  effects  of  packing 

and  memory  fragmentation. 


3.3  Ideal  selection  with  clustered  indexing 

The  performance  of  selection  with  clustered  indexing  was  also  considered.  With  this 
operation,  the  location  of  the  records  is  known  a  priori,  and  the  records  satisfying  the  selection 
criterion  are  assumed  to  be  adjacent.  Two  different  data  ordering  schemes  are  assumed.  With 
record  parallel  access,  consecutive  records  are  on  the  same  record  parallel  page,  while  with  field 
parallel  access,  consecutive  records  are  on  the  same  field  parallel  page.  The  Wisconsin 
benchmark  requires  that  the  time  required  to  return  and  format  the  result  to  the  user  be  included 
for  this  particular  selection  operation.  This  overhead  is  not  included  in  this  analysis  —  only  the 
time  required  to  retrieve  the  desired  records. 

In  principal  it  is  best  to  perform  this  operation  with  record  parallel  access.  In  practice, 
when  one  includes  the  effect  of  memory  fragmentation  and  packing,  this  may  not  be  the  case. 
The  time  to  perform  this  operation  with  record  parallel  access  is  given  below,  assuming  that  N  is 
the  number  of  records  which  satisfy  the  selection  query.  The  time  is  roughly  equal  to  the  page 
access  time,  t,  multiplied  by  N  divided  by  the  number  of  records  on  a  page  (~M^  If).  There  is  a 
good  chance  that  the  set  of  consecutive  selected  records  will  start  and  end  in  the  middle  of  a 
record  parallel  page,  and  that  an  additional  page  read  will  be  required.  This  is  taken  into 
consideration  in  equation  0.4. 


(0.4) 


The  expression  for  the  time  required  to  perform  this  operation  with  field  parallel  access  is 
similar  and  is  given  below.  With  field  parallel  access,  r/w  page  reads  are  required  to  retrieve  a 
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complete  record.  This  set  of  pages  as  mentioned  earlier  is  referred  to  as  a  block.  The  time  to 
perform  this  operation  with  this  form  of  access  is  roughly  equal  to  the  amount  of  time  required  to 
readout  a  block,  t  r/w,  multiplied  by  the  number  of  blocks  which  would  be  needed  to  contain  the 
N  selected  records:  N  w/M^ .  Equation  0.5  also  includes  the  probability  that  an  additional  block 
may  need  to  be  retrieved. 


^(N-1)  W 

+  1 

J 


(0.5) 


The  time  required  to  perform  selection  with  no  indexing  is  plotted  in  Figure  3-15  as  a 
function  of  relative  selectivity  for  the  two  different  accessing  modes.  As  mentioned  earlier,  the 
set  of  contiguous  selected  records  will  occupy  a  certain  minimum  number  of  pages  in  the  case  of 
record  parallel  access  or  blocks  in  the  case  of  field  parallel  access  and  be  likely  to  start  and  end 
in  the  middle  of  a  page/block,  thereby  requiring  an  additional  page^lock  read.  The  overhead  of 
having  to  read  an  additional  block  with  field  parallel  access  is  greater  than  the  overhead  of 
retrieving  an  additional  page  with  record  parallel  access.  Thus,  under  ideal  conditions,  this 
operation  is  best  performed  with  record  parallel  access.  With  field  parallel  access,  the  overhead 
is  larger  when  there  are  more  pages  in  a  block  (when  w  is  small).  Performing  very  low 
selectivity  operations,  such  as  retrieving  a  single  record,  is  also  best  performed  with  record 
parallel  access.  With  this  strategy  only  one  page  read  is  necessary,  while  with  field  parallel 
access  an  entire  block  would  have  to  be  retrieved. 


relative  selectivity  (in  percent) 

Figure  3-15:  Time  to  perform  selection  with  clustered  indexing  neglecting  the  effect  of  seek 

time,  packing  and  memory  fragmentation. 


3.4  Ideal  selection  with  no  indexing 

The  performance  of  selection  with  no  indexing  was  also  considered.  With  this  operation, 
the  locations  of  the  records  satisfying  the  selection  query  are  not  known  a  priori,  so  all  the 
records  have  to  be  searched  to  determine  which  ones  satisfy  the  selection  query.  Records  which 
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satisfy  the  query  would  then  be  retrieved.  It  is  assumed  that  selected  records  are  uniformly 
distributed  throughout  the  memory. 

Ideally  the  accessing  method  that  should  be  used  for  this  operation  depends  on  the 
selectivity  of  the  operation.  Very  low  selectivity  operations  should  be  performed  by  searching 
using  field  parallel  access,  and  retrieving  selected  records  with  record  parallel  access.  With 
moderately  high  selectivity  operations,  all  pages  will  have  to  be  retrieved  so  either  field  or  record 
parallel  access  could  be  used. 

In  general,  if  this  operation  is  performed  using  record  parallel  access,  all  record  parallel 
pages  containing  the  relation  will  have  to  be  retrieved  so  that  each  record  can  be  examined  to 
determine  which  ones  satisfy  the  selection  query.  Thus,  the  time  required  to  perform  this 
operation  with  record  parallel  access  is  the  same  for  all  selectivities,  and  is  the  same  as  the  time 
to  retrieve  records  for  a  projection  operation  with  record  parallel  access. 


w 

Tsearch  On  msec) 

2^  bits 

.32 

2^  bits 

.31 

2^  bits 

.62 

2^  bits 

1.23 

Table  3-2:  Effect  of  w  on  Tsearch.  Tjearch  increases  when  w  is  larger  than  f  In  this  example  f  is 

32  bits. 

This  operation  can  also  be  performed  with  field  parallel  access  or  by  combining  field  and 
record  parallel  access.  The  first  accessing  approach  is  termed  field-field  and  the  later  field- 
record.  With  the  two  approaches,  the  search  part  of  the  operation  is  performed  using  field 
parallel  access;  only  the  operand  containing  field(s)  is/are  retrieved.  Records  satisfying  the 
selection  query  are  read  out  using  record  parallel  access  (for  the  field-record  approach)  or  using 
field  parallel  access  (for  field-field).  The  search  part  of  the  operation  requires  time  Tsearch  i  an 
expression  is  given  in  equation  0.8  and  tabulated  for  different  values  of  w  in  Table  3-2.  It  is 
assumed  that  [f/w]  field  parallel  page  reads  are  needed  for  every  block,  where  f  is  the  size  of 

the  selection  operand.  (There  is  a  probability  that  it  could  require  one  more  page  read  than  this.) 
The  searching  time  is  inefficient  when  w  is  a  much  larger  than  f  The  time  required  to  retrieve 
the  selected  records  is  referred  to  as  the  readout  time.  The  time  to  read  out  records  satisfying  the 
selection  query  is  denoted  as  Tr^r  if  record  parallel  access  is  used  and  as  Trof  if  field  parallel 
access  is  used.  With  this  notation,  the  expression  for  the  average  time  to  perform  ideal  selection 
with  no  indexing  and  field-record  access  is  given  below  assuming  that  N  records  satisfy  the 
selection  query. 

Tsearch  +  T^or  (0.6) 

where. 
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(0.7) 


(0.8) 


In  the  equation  for  Tror,  the  quantity  raised  to  the  N*  power  is  the  probability  that  apage 
is  empty  given  N  records  are  selected.  One  minus  this  quantity  raised  to  the  N"’  power  is  the 
probability  that  a  page  contains  at  least  one  selected  record  and  needs  to  be  read  out.  It  should  be 
noted  that  the  selection  operand  is  readout  twice  with  this  approach  during  both  the  searching 
and  readout  portions  of  the  operation.  The  first  two  terms  in  the  expression  for  Tsearch  represent 
the  time  required  to  read  out  the  operand  from  a  single  block.  The  second  term  is  the  number  of 
blocks  in  the  memory. 

The  average  time  to  perform  selection  with  no  indexing  using  field-field  access  is  given 
below.  The  expression  for  Trof  is  very  similar  to  the  expression  for  Tror-  The  quantity  raised  to 
the  N‘'’  power  is  the  probability  that  a  block  is  empty  given  that  N  records  are  selected.  One 
minus  this  quantity  raised  to  the  N*  power  is  the  probability  that  a  block  has  a  selected  record  on 
it  and  will  need  to  be  retrieved.  It  is  assumed  that  the  selection  operand  does  not  have  to  be  read 
out  twice  during  the  search  and  readout  portions  of  the  operation.  This  is  taken  into 
consideration  with  the  term  multiplied  by  T^f  in  equation  0.9. 


The  time  to  perform  ideal  selection  with  no  indexing  is  plotted  vs.  selectivity  in  Figure  3- 
16  for  the  three  access  modes.  The  search  time  for  the  field-record  plot  is  assumed  to  be 
.3 1  msec.  The  time  to  perform  this  operation  saturates  for  the  two  field  parallel  based  accessing 
approaches.  With  field-record  access,  this  occurs  when  it  is  likely  that  every  record  parallel  page 
will  have  a  selected  record  on  it  and  will  need  to  be  read  out.  At  this  point  the  fraction  of  records 
satisfying  the  selection  criterion  is  roughly  equal  to  one  divided  by  the  number  of  records  on  a 
page.  For  field-field  access  the  saturation  happens  when  it  is  probable  that  every  block  needs  to 
be  read  out:  when  the  fraction  of  records  satisfying  the  selection  criterion  is  approximately  equal 


20 


to  one  divided  by  the  number  of  records  in  a  block;  when  w  is  smaller  the  saturation  occurs 
sooner. 


relative  selectivity  (in  percent) 

Figure  3-16:  Time  to  perform  selection  with  no  indexing  neglecting  the  effect  of  seek  time  and 
packing.  For  record  parallel  and  field-record  access  the  effect  of  memory  fragmentation  is  not 
included.  The  search  time  for  field-record  access  is  assumed  to  be  .31  msec. 

The  field-record  parallel  readout  approach  can  be  disastrous  depending  on  how  the  search 
and  readout  parts  of  the  operation  are  carried  out  with  respect  to  each  another.  This  problem  is 
explained  with  reference  to  Figure  3-17  which  shows  a  single  super-block.  On  this  diagram,  the 
["f/wlfield  parallel  page  reads  required  to  scan  through  a  single  block,  don’t  provide 

information  about  the  other  records  on  the  record  parallel  page  that  is  completely  visible. 
(b-l)|*f/w1  additional  field  parallel  pages  would  need  to  be  retrieved,  to  determine  the  total 

number  of  records  satisfying  the  selection  query  on  the  visible  page,  where  b  is  the  number  of 
blocks  in  a  super-block.  To  improve  the  performance  of  this  operation  all  blocks  in  a  super¬ 
block  should  be  searched  first  before  attempting  to  read  out  selected  records  using  record  parallel 
access.  If  instead  the  records  are  retrieved  after  each  block  is  searched,  each  record  parallel  page 
may  have  to  be  readout  b  times  —  once  for  each  block. 
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Single  super-block 
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field  parallel 
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direction 


b  blocks 
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Figure  3-17:  For  selection  with  no  indexing  performed  using  field-record  access,  [” f/w"] field 

parallel  page  reads  are  required  per  block  in  the  search  part  of  the  operation.  These  pages  reads 
yield  little  information  about  the  record  parallel  page  that  is  completely  visible  and  those  pages 
behind  it.  Additional  field  parallel  pages  would  need  to  be  retrieved,  to  determine  the  total 
number  of  records  selected  on  the  visible  page,  where  b  is  the  number  of  blocks  in  a  super-block. 
With  this  in  mind,  if  this  operation  is  performed  with  field-record  access,  all  blocks  in  a  super¬ 
block  should  be  searched  first  using  field  parallel  access  before  attempting  to  read  out  the 
selected  records  using  record  parallel  access. 


3.5  Ideal  selection  with  non-clustered  indexing 

With  the  bi-orthogonally  accessed  3D  two-photon  memory,  non-clustered  indexing 
would  only  be  preferable  to  selection  with  no  indexing  if  it  could  eliminate  more  page  reads.  A 
lower  bound  on  the  time  to  perform  selection  with  non-clustered  indexing  is  the  time  that  it  takes 
to  read  out  pages  or  blocks  with  selected  records  from  the  memory  (assuming  no  page  is  read  out 
twice).  This  bound  neglects  the  time  to  determine  where  the  records  are.  The  difference 
between  the  two  operations  is  the  search  time,  Tsearch’ which  is  generally  quite  small.  Selection 
with  non-clustered  indexing  would  only  be  advantageous  for  extremely  low  selectivity 
operations,  when  the  search  time  is  comparable  to  the  readout  time  or  larger. 

3.6  The  effect  of  memory  fragmentation 

At  this  point  we  a  have  considered  the  performance  of  a  bi-orthogonally  accessed  3D 
memory  under  ideal  conditions.  In  real  systems  there  will  be  practical  issues  which  can  cause 
the  performance  to  deviate  from  the  ideal.  On  such  problem  is  memory  fragmentation,  which 
occurs  when  records  fit  poorly  onto  record  parallel  pages.  As  described  in  the  previous  section, 
this  problem  reduces  the  usable  memory  capacity  and  the  bit  retrieval  efficiency  by  a  factor  of  a. 
Correspondingly  the  time  to  perform  operations  or  portions  of  operations  using  record  access  is 
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increased  by  roughly  a  factor  of  (1-1/a).  This  factor  will  generally  be  greatest  when  w  is  small. 
Table  3-3  lists  values  obtained  for  a  and  1/a  for  various  values  of  w.  When  w  is  eight  bits  the 
time  required  to  perform  this  operation  with  record  parallel  access  is  increased  by  23%,  since  1/a 
is  equal  to  1.23.  However,  when  w  is  larger,  there  is  very  little  inefficiency. 


size  of  w 

a 

1/a 

2^  bits 

.81 

1.23 

2^  bits 

.96 

1.04 

t  bits 

.99 

1.01 

2’  bits 

.99 

1.01 

Table  3-3:  Effect  of  w  on  a.  The  overhead  for  storing  a  record  increases  when  w  is  small  due  to 

memory  fragmentation  of  type  II. 

This  particular  problem  in  some  instances  can  make  field  parallel  access  preferable  when 
under  ideal  conditions  record  parallel  access  would  have  been.  For  selection  with  clustered 
indexing,  for  example,  field  parallel  access  with  w  =  2’bits  can  be  preferable  to  record  parallel 
access  if  w  is  small  and  the  selectivity  is  high.  This  is  also  the  case  for  selection  with  no 
indexing  for  high  selectivites. 

3.7  The  effect  of  the  packing  strategy 

Packing  can  have  an  even  greater  impact  on  performance  than  memory  fragmentation. 
As  described  in  the  previous  section,  packing  on  average  extends  the  time  required  for  an 
operation  by  1/B,  where  B  is  the  minimum  number  of  super-blocks  required  to  store  the  relation, 
depending  of  the  accessing  technique  used.  Thus  if  record  parallel  pages  are  packed  first, 
operations  or  portions  of  operations  performed  with  field  parallel  access  will  on  average  require 
1/B  more  time  to  complete.  However,  the  time  to  perform  operations  with  record  parallel  access 
will  not  be  affected.  Likewise  if  record  parallel  pages  are  packed  first  the  time  for  field  parallel 
access  will  similarly  increase,  but  the  time  to  perform  this  operation  with  record  parallel  access 
will  not  be  affected.  In  this  study,  B  is  equal  to  two.  Thus  the  average  time  to  perform 
operations  can  be  increased  by  50%,  depending  on  the  accessing  technique  used.  While  the 
effect  of  adverse  packing  decreases  for  larger  relations,  it  increases  for  smaller  relations. 

3.8  Bit  retrieval  efficiency 

A  metric  termed  bit  retrieval  efficiency  was  devised  to  measure  the  optimality  of 
operations  performed  with  the  bi-orthogonally  accessed  3D  optical  memory.  It  is  defined  as  the 
ratio  of  the  minimum  number  of  bits  required  for  an  operation  to  the  average  number  of  bits  that 
are  retrieved.  It  reflects  the  utilization  of  memory  bandwidth  and  the  random  access  capability 
of  the  memory  device.  The  average  number  of  bits  that  are  retrieved  for  a  given  operation  is 
determined  by  multiplying  the  average  number  of  pages  retrieved  by  the  number  of  bits  on  a 
page.  The  average  number  of  pages  is  computed  by  dividing  the  time  to  perform  the  operation 
by  t,  the  page  access  time 

When  projection  is  performed  with  field  parallel  access  the  bit  retrieval  efficiency  is 
roughly  one,  the  most  optimal.  It  decreases  when  w  is  large,  particularly  if  the  desired  field  is 
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very  small.  It  also  decreases  if  record  parallel  pages  are  packed  first,  by  roughly  a  factor  of 
B/(B+1). 

For  selection  with  clustered  indexing,  the  minimum  number  of  bits  that  would  have  to  be 
retrieved  for  the  operation  is;  rN.  The  bit  retrieval  efficiency  for  selection  with  clustered 
indexing  is  plotted  vs.  selectivity  for  the  different  access  modes  in  Figure  3-18  neglecting  the 
effect  of  packing.  For  most  selectivities,  the  bit  retrieval  efficiency  is  quite  good  and  approaches 
one  (record  parallel  first)  with  increasing  selectivity.  Only  for  very  low  selectivity  operations  is 
this  operation  inefficient,  when,  for  example,  an  entire  page  or  block  has  to  be  retrieved  when 
only  one  record  is  desired. 


■a 

<u 

.tj 

a 

r 

I| 

O 

W  b 

^  sa 

a  b 

a  o 

I 

c  U 
•a  a 
a  a 


«3 

jn 


relative  selectivity  (in  percent) 

Figure  3-18:  bit  retrieval  efficiency  for  selection  with  clustered  indexing.  This  neglects  the 
effect  of  memory  fragmentation,  seek  time  and  packing. 

The  minimum  number  of  bits  required  for  selection  with  no  indexing  can  be  expressed  as: 
f  R-t-  (r  -  f  )N  The  first  term  reflects  the  searching  and  the  second  the  read  out.  It  assumes  the 
selection  operand  is  not  read  out  twice.  The  bit  retrieval  efficiency  for  this  operation  is  plotted  in 
Figure  3-18  neglecting  the  effect  of  packing,  and  assuming  that  selected  records  are  distributed 
randomly  throughout  the  memory.  Performing  this  operation  with  record  parallel  access  is 
inefficient  in  all  but  extremely  high  selectivity  operations,  because  all  record  parallel  pages  have 
to  be  retrieved.  For  the  two  field  parallel  accessing  based  approaches  (field-field  and  field- 
record),  the  searching  part  of  this  operation  is  performed  efficiently.  It  is  the  readout  part  of  the 
operation  that  is  inefficient.  If  selected  records  are  uniformly  distributed  throughout  the 
memory,  it  is  likely  that  pages  and  blocks  will  only  have  a  few  selected  records  on  them,  but  will 
need  to  be  readout.  The  efficiency  improves  as  the  number  of  selected  records  per  page  (or 
block).  For  the  selectivity  range  shown  in  the  graph,  field-record  access  is  most  efficient  for 
very  low  selectivity  operations.  In  this  selectivity  range,  the  searching,  which  is  quite  efficient, 
dominates,  and  the  readout  inefficiency  is  not  noticed. 
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The  bit  retrieval  efficiency  with  a  bi-orthogonally  accessed  3D  optical  memory  is  in 
general  quite  good.  This  is  because  the  accessing  feature  improves  the  random  access  capability 
of  the  memory.  The  graph  of  the  bit  retrieval  efficiency,  Figure  3-18,  for  selection  with  no 
indexing,  provides  insight  into  what  future  developments  in  memory  technology  can  improve  the 
performance  of  database  systems.  With  this  particular  operation,  it  is  likely  that  a  page  or  block 
will  have  a  small  number  of  selected  records  on  it  and  will  need  to  be  read  out.  A  bi- 
orthogonally  accessed  3D  optical  memory  would  outperform  the  one  studied  here  if  its  page  size 
were  reduced  and  its  bandwidth  kept  the  same. 

4.  PERFORMANCE  COMPARISON  WITH  OTHER  SYSTEMS 

In  this  section  the  performance  of  the  3D  bi-orthogonally  accessed  memory  is  compared 
with  the  performance  of  the  GAMMA  ^  parallel  database  machine  and  the  projected  performance 
of  PHOEBUS  ^  an  optoelectronic  parallel  database  machine  based  on  parallel  readout  optical 
disk  technology.  These  two  systems  are  described  briefly  below. 

GAMMA  is  a  parallel  relational  database  machine  that  was  built  at  the  University  of 
Wisconsin.  It  is  based  on  a  shared-nothing  architecture  and  has  32  Intel  386  processors  each 
with  8  Mbytes  of  memory  and  its  own  330  Mbyte  disk  and  controller.  The  processors  are 
connected  with  an  Intel  iPSC/2  hypercube  network.  Data  is  distributed  across  the  disks  using 
horizontal-partitioning.  With  this  data  organization  scheme,  individual  records  are  stored 
together  in  the  same  file,  but  records  belonging  to  a  relation  are  distributed  across  all  the  disks. 
This  organization  allows  operations  to  easily  be  executed  in  parallel  on  all  32  processors.  It 
should  be  noted  that  the  benchmark  was  run  on  a  system  with  30  processors  and  disks. 

PHOEBUS  is  a  proposed  optoelectronic  database  machine  which  utilizes  parallel  readout 
optical  disks.  In  this  system  complete  records  are  arranged  on  radial  strips  on  the  surface  of  a 
Magnetooptical  optical  disk  and  are  read  out  in  parallel  one  at  a  time.  The  polarization  rotation 
of  retrieved  ONEs  and  ZEROs  from  the  memory  is  +9  degrees  and  -6  degrees  respectively.  For 
selection  with  no  indexing  which  checks  for  equality  (Only  this  particular  selection  operation  is 
described  and  supported),  records  are  retrieved  from  the  disk;  fields  other  than  the  one  involved 
in  the  selection  operation  are  filtered  out  with  a  "mask".  The  desired  field  is  then  positioned  on 
the  operand  holding  SLM.  A  ID  polarization  rotating  SLM  is  loaded  with  the  operand.  A 
ZERO  on  the  SLM  will  cause  the  polarization  of  a  bit  of  light  to  be  rotated  by  +0  degrees  and  a 
ONE  by  -0  degrees.  If  bits  on  the  SLM  agree  with  those  retrieved  from  the  memory,  the 
polarization  of  the  retrieved  bit  will  be  rotated  such  that  it  is  0  degrees.  Bits  not  agreeing  will  be 
rotated  to  +  or  -  20  degrees.  An  analyzer  is  placed  after  the  SLM  which  functions  to  filter  out 
bits  with  0  degree  polarization  rotation.  After  the  analyzer,  bits  that  are  ON  signify  bit 
differences.  The  light  at  the  output  of  the  analyzer  is  focused  onto  a  single  detector.  If  a  value 
other  than  ZERO  is  detected,  it  signifies  that  the  field  from  the  retrieved  record  did  not  equal  the 
operand. 

Table  4-4  gives  the  projected  performance  of  the  3D  two-photon  memory  based  data 
filter  for  selection  with  no  indexing  using  the  Wisconsin  benchmark  with  w  equal  to  2^  bits.  The 
time  to  perform  this  operation  with  1%  and  10%  selectivity  is  the  same  because  all  pages  and 
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blocks  are  very  likely  to  have  at  least  one  selected  record  on  them.  The  selection  times  were 
calculated  using  the  equations  given  earlier,  and  the  values  listed  in  Table  3-1. 


Field  parallel  pages  i 

Type  of  access 

Time  (in  msec) 

record  parallel 

16.45 

field-field  access 

24.19 

field-record  access 

16.92 

jacked  first: 

Type  of  access 

Time  (in  msec) 

record  parallel 

24.67 

field-field  access 

16.13 

field-record  access 

16.75 

Table  4-4:  Time  to  perform  selection  with  no  indexing  with  a  selectivity  of  1%  and  10%  using 

the  Wisconsin  benchmark,  w  equals  2^  bits. 


Table  4-5  lists  the  times  to  perform  selection  with  no  indexing  for  GAMMA  along  with 
the  projected  times  for  PHOEBUS  and  the  bi-orthogonally  accessed  two-photon  memory.  With 
the  two-photon  memory,  all  pages  have  to  be  read  out  for  this  operation  for  the  given 
selectivities.  Thus,  the  times  for  the  two-photon  memory  only  reflect  the  superior  raw  bandwidth 
and  not  the  data  isolation  features  of  the  bi-orthogonal  accessing  approach.  The  superior 
performance  is  clear.  The  performance  improvement  should  be  even  more  notable  for  operations 
utilizing  the  bi-orthogonal  accessing  capability. 


3D  two-photon  memory 


PHOEBUS 

GAMMA 
(using  30  of  its 
32  processors) 

(record  parallel 
packing) 

(field  parallel 
packing) 

memory 

device 

single  parallel 
access  optical 
disk 

30  sequential 
access  disks 

3D  2-photon 
memory 
(w  =  2^  bits) 

3D  2-photon 
memory 
(w  =  2^  bits) 

secondary 

storage 

bandwidth 

208*8/240nse 
c  =  ~  9 

Gbits/sec 

32/()  =  200 
Mbits/sec  (pg 
58)*** 

(1024  x  1024)/ 
10  usee  =  200 
Gbits/sec 

(1024  X  1024)/ 
10  usee  =  200 
Gbits/sec 

1%  of 
records 

.42  sec 

8.16  sec 

.0164  sec 

.0161  sec 

10  %  of 
records 

.42  sec 

10.82  sec 

.0164  sec 

.0161  sec 

Table  4-5:  Times  to  perform  selection  with  no  indexing  for  GAMMA,  PHOEBUS,  and  the  bi- 
orthogonally  accessed  two-photon  memory  based  data  filter  assuming  a  relation  with  10^  208 

byte  records 
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5.  FEASIBILITY  STUDY 

The  performance  potential  of  the  bi-orthogonally  accessed  3D  memory  lead  to  a  study  to 
determine  the  feasibility  of  building  an  optoelectronic  database  data  filter  to  interface  with  this 
memory.  In  the  sections  that  follow,  components  that  would  be  needed  for  a  data  filter  are 
described  in  terms  of  functionality,  speed,  packaging  requirements,  area  and  power.  Issues 
relating  to  the  implementation  of  the  actual  bi-orthogonally  accessed  3D  two-photon  memory  are 
not  discussed. 

5.1  System  description  overview 

Figure  5-19  shows  the  components  of  the  data  filter  that  were  considered  in  this  study 
with  their  assumed  data  rates  and  formats.  Figure  5-20  illustrates  details  of  the  packaging 
requirements.  It  is  assumed  that  that  w  =  8  bits;  this  choice  has  little  effect.  In  the  discussions 
that  follow,  it  is  assumed  that  fields  are  a  multiple  of  w  bits  in  size. 

The  optoelectronic  data  filter  consists  of  a  modified  CCD  array  called  a  compander 
which  is  flip-chip  bonded  onto  an  electronic  processing  chip  capable  of  performing  data  filtering 
operations.  The  number  of  bits  on  a  page  retrieved  from  the  3D  two-photon  memory,  M^,  is 
large,  approximately  10^.  The  memory  page  rate,  fopt  ,is  likely  not  to  be  that  fast.  100 
Kpages/sec  is  assumed  in  this  study.  This  is  a  slow  speed  and  a  large  amount  of  data  for  an 
electronic  processing  chip.  The  compander  functions  to  convert  the  data  retrieved  from  the 
parallel  readout  optical  memory  into  a  format  more  compatible  with  electronic  processing,  i.e.,  a 
higher  data  rate  and  reduced  parallelism. 
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parallel  processing  chip  with 
(64x8)  PEs  \ 


2-Photon  Memory 

lOOG  bits 


compander  with  (64  x64) 
super-pixels  ' 

10  usee 


record  parallel 


(1024  X  1024)  bits 
100  K  pages/sec  ^ 

FIFO  and  FIFO  control 


(64  X  64)  bits  ^ 

25.6  M  pages/sec 

580  bits 

205  M  pages/sec 


Figure  5-19:  Components  that  were  assumed  for  the  database  data  filter  with  their  data  rates. 


Figure  5-20:  back  side  of  compander  and  processing  chip  (left)  and  front  side  of  processing  chip 

(right). 


Single  compander  super-pixel  Cside  view'> 


control  circuitry 
and  drivers 


25.6  Mbits/sec 
PE  plane 


Figure  5-21 :  Side  view  of  a  compander  super-pixel 

The  compander  is  shown  in  Figure  5-19  and  Figure  5-21.  The  CCD  sensors  on  the 
compander  are  partitioned  into  (M/n)^  super-pixels.  Each  super-pixel  has  an  array  of  CCD 
sensors,  some  control  circuitry,  a  driver,  a  via  and  a  flip-chip  solder  bump.  In  operation,  the 
compander  detects  a  page  of  bits  from  the  memory.  Then  simultaneously  each  super-pixel  in 
the  compander  serially  rasters  out  its  contents  (amplified)  through  its  via  to  the  front  side  of  the 
processing  chip.  This  procedure  has  the  effect  of  converting  a  single  memory  page  of  bits 
transmitted  at  a  rate  fopt  into  n^  smaller  pages,  referred  to  as  a  compander  pages,  which  it  sends  to 
the  processing  chip  a  rate  n^fopt-  Each  compander  page  would  have  (M/n)^  pixels.  The 
bandwidth  into  and  out  of  the  compander  is  the  same,  thus  no  bottlenecks  exist;  just  the  data 
format  is  altered. 

Fabrication  limitations  constrain  the  vias  on  the  compander  to  be  at  least  500  um  apart 
This  constraint  led  to  choosing  super-pixels  with  256  CCD  sensors  (n  =  16).  There  would  be 
approximately  4,000  super-pixels.  The  resulting  compander  would  be  3.2  cm  x  3.2  cm  in  size, 
and  the  compander  page  rate  would  be  25.6  MHz. 

The  processing  chip  is  estimated  to  be  1.5  cm  x  1.5  cm  in  size  and  has  (M/n)^  solder 
bumps  to  obtain  data  from  the  compander.  (Details  of  this  chip  will  be  discussed  in  a  following 
section.)  Because  of  the  compander  and  processing  chip  size  mismatch,  the  solder  bumps  on  the 
chip  are  not  directly  underneath  the  compander  vias  that  they  are  affiliated  with.  The  back  side 
of  the  compander  chip  is  used  as  an  interconnection  plane  to  route  the  signals  from  the 
compander  vias  to  solder  bumps  on  the  chip  as  shown  in  Figure  5-20. 

The  processing  chip  itself  has  (64  x  8)  =  512  PEs  each  with  eight  solder  bumps  that 
receive  data  from  the  compander.  The  data,  after  being,  processed  by  the  chip  is  returned  to  the 
back  side  of  the  compander  through  another  set  of  solder  bumps  (~  580  of  them)  at  a  rate  of  205 
MHz.  Two  metal  layers  on  back  side  of  the  compander  would  necessitate  20  um  and  65  um 
metal  line  separation  for  the  lines  into  and  out  of  the  chip,  respectively. 


5,2  Data  format  and  the  compander 

In  operation  the  compander  de-interleaves  (in  two  dimensions)  a  data  page  from  memory. 
To  compensate  for  this,  a  word  desired  in  parallel  by  the  processing  chip  must  be  spatially 
interleaved  on  a  memory  page  with  other  words  which  would  arrive  at  the  processing  chip  in 
parallel  at  a  different  time.  Since  words  are  desired  in  parallel  from  both  orthogonal  directions. 
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of  a  interleaved  in  only  one-dimension  as  opposed  to  two,  and  interleaved  to  a  depth 


llel^ 


field  parallel 


record  interleaved  in  memory 
to  compensate  for  the  de-interleaving 
of  the  compander 


record  parallel 

Fi^re  5.22:  record  interleaved  in  memory  to  compensate  for  the  de-interleaving  of  the 

compander 

approxiZrnoZro/iZ"dZ7"Tr  ^"1  “«=- 

on  each  «e,d  parallel  Z^Zt^ZTeX  Z 

bit  word  in  parallel  belonging  to  a  unique  record  F«rl!  pp  2 

each  of  the  compander  pages  that  a  single  memnrv  receives  n  such  words  -  one  for 

words  belongs  to  a  unique  record  froml.  ^  "  converted  into.  Since  each  of  these 

field  parallel  access  are  interleaved  in  time^^im  rd"^th  'r  of  view,  fields  retrieved  with 

field  but  to  other  records  The  55  Mbit  FIFO  belonging  to  the  same 

5.3  Processing  chip  functionality 

The  processing  chip  as  mentioned  earlier  is  partitioned  into  S 1 9  ppc  p  i,  u 
comparator  and  some  memory.  The  functionalitv  of  tht.  I  T  ^ 

explaining  how  two  operations  would  be  performed  LectL^llr  described  by 

pciiormea.  selection  and  selection-projection  with  no 
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indexing  with  field-record  or  field-field  access  Th^c  u 

implement  on  the  chip.  The  other  operations  are  just  mentioned 

5.3.1  Example:  selection  and  selection-projection  with  no  indexing 


^.amshitsineachPHarewnSr^-^^^^^^^^ 

either  directly  above  or  below  is  started. 

which  o^efsi^Tsf;  r  s":;;:" 

containing  field  would  be  retrieved  Records  iLt  r 

then  be  read  out  using  field  or  record  parallel  access  ^^Psfying  the  selection  query  would 


24.  i„  .he  ■"  -<  : 

being  examined.  It  performs  the  desired  comnari*  ^  the  first  8  bits  of  the  field  that  i 

The  resull  of  the  comparison  is  stored  in  the  firsrbit“of  ite  n^bT  ", 

records  retrieved  with  field  parallel  access  are  int^rl^  !i  memoiy.  Sine 

receives  n^-l  more  bytes  (each  byte  beloneine  tn  ^ 

comparisom  with  the  same  locally  stored  operand,  The“S«Tftese  c 

in  the  remaining  n^~l  bits  of  tbp  +  +,  .  ^  of  these  comparisons  are  store( 

are  being  performed  on  the  data  from  the  **"'  described  comparison; 

daisy-chained,  with  a  siXtsvIatn  t 'T  TST  i‘ 

operands  can  be  loaded  "feht  wht  .he  fl“ 

place  when  data  from  the  next  memoiy  page  ani\er°A  slnde'd  ™"  >>0  " 

to  be  unique.  Although  not  important  fof  .hi»  ’  r  *u  ^  ^  oporand 

performed  with  record  parallel  access  After  afTth'™’ '  Pot  operations 

.ransfemedtotheoperanLegi^sTIach  PE  in  me?r^^ 

page  of  the  memory.  The  results  of  the  comnaris  °  e  compared  with  the  data  from  next 
those  from  the  previous  byte  (stored  in  the  set  second  byte  are  combined  with 

memoty.  The  proceduSlImd  f^a!  m^  "'T7^ 
select  stams  memories  reflect  which  records  in  a  sing“  Lk  hit  ^een  sdemer^ 

read  from'.hrmel"ttC.rmTFor7eir'  “’fl’t"'"'' 

organization  considerS  in  this  study  The  f 

the  order  that  they  were  stored  to  dLtly  mask  ouTfieldt“,77 

they  arrive.  If  every  record  is  selected  on  a  o'  j  belonging  to  selected  records  as 

ry  IS  selected  on  a  given  compander  page,  there  is  enough  bandwidth  to 
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-d^al.  .....  along  w.,h  «.eir  .e,ec....an.a  bi,.  ofT-cb.p  before  .be  nex.  compander  page 

record  parallel  orir^SIn  hlvt^lecled^TOXon  .hemTrlr  ‘'7™“'  ‘he 

are  no.  used  .o  indica.e  .he  exac. p^tirif  ^ ,  a  t  These bi.s 

selecion  operation  is  jus.  repea.ed^  Figure  5  23  shows  th  ^  P^*«-  ‘he 

selec.-s.ah.s  memories  and  pages  in  .het^ecord  narauTa  ^  eoirespondence  benveen  bi.s  in  the 
into  these  memories  in  .he  orteVinto^  ™«en 

mimic  this  read  out  topology.)  If  any  bit  in  fte  Wh  r^r^’f  m  “  a“Per-pixel  read  out  must  also 
is  ONE  it  would  indicate  thm  L  Icth  pane  in  fh!  a  ™™ories 

-cord  on  it.  In  an  earlier  section  irwa  ”mtno„Jd 

searching  all  blocks  in  each  super-block  usintr  field  ^  n  i  best  performed  by 

records  in  the  record  parallel  direction  Thistan  efficie  d  T"*  “'““d 

hardware.  The  n^  values  in  tite  sele  “stams  memt^Zf ““^hed 
of  information  can  be  ORed  with  results  from  the  i^^vt  blocks 

final  block  of  tite  super-block  is  betg  ^hlrd  for  tel  1 ‘^e 
are  ORed  together  to  determine  which  pages  in  Z  rSTnamile'lt memories 
As  mentioned,  the  selections  need  fo  be  a  read  out. 

retrieved  with  record  parallel  access  If  qoI  ^  ^  second  time  when  the  data  is 

much  more  memory  on  the  chip  would  be  needed  of  each  block  were  not  combined 

the  chip  area  would  grow  by  approximately  a  factor  of  four  ^  Wisconsin  benchmark 

If  it  is  determined  that  a  nave  j7tl  J  unreasonable, 

operands  can  be  ^ 

desired  comparisons  can  be  performed  Altemativel^  T*  ^  ^  ^^e 

The  masking  and  comparison  op^atLf  be  done  off-chip, 

likewise  be  performing  either  on  or  off-chip.  ^  filtering  operations  can 

5.4  Chip  area 

urn  MoSs  pZ«s"  W^'Z'sI'vlts  rc7'‘°"' ‘^“p  area  assuming  a  8 
cm.  The  asea  reguirad  for  t^k  .72  cm  t  .72 

in  this  estimate.  This  circuitry  area  estlmat/  i  ^  r  ^  ^  interconnects  is  not  included 
70%  of  the  area  for  .he.sa  ZL?.!  ? m!  .-"7  by  10/3  to  allow  a  bit  more  than 

component - la  generous  since  the  chip  ,s  essentially  a  raemoiy  chip. 

XMt  minimum  sized  register  ~  ..m  v  .  e  .Z - 

_8  bit  digital  comparator  * 

Jo^l  decoder  for  each  PE  ~^„mvao..„- - 

Spider  bump  TT  - - - - - — 

~'t  rf _ _ _ _ _ 25  urn  X  25  urn 

^  ^25  um  X  125  um 

area  estimated  from  reference  19 

Table  5-6:  Assumptions  that  were  made  to  estimate  tite  area  required  for  tite  processing  chip. 
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Chip  circuitry 
output  buffers 


Area 

=  6.3  mm  X  6.3  mm 
580(125  uni)^  =  9  1  mm2 


[j/O^soIderbun^ 
Total  chip  area 


K)00  (25  urn)  2  -  g  mm2 
-52  mm2  =  7.2  mm  x  7.2  mm 


Table  5-7:  Summaiy  of  area  required  for  processing  chip. 

5.5  Chip  power 

^^phalfcSr^T^we*^  Table  5-8  assuming 

SPICE  srurulanon.  The  eshmate  for  .he  chip  power  is  ^rZ  down  Iffo" 


P  CHIP  —  P. 


chip  elk  Pcircuitiy  +  P off-chip  drivere 


p  .  - J  '^iJ-cnip  orivers 

—  - 


Dccd 

D, _ 

(M/n)^ 

lo _ 

fiipt _ 


_Ci _ 

VpD 

_Wi _ 

Pcik  B 


p 

ofT-chip  driver 


I  linear  compander  dimension _ 

Jinear  chip  dimension 

niemory  page  size  _ 

number  of  super-pixels _ 

chip  clock  speed  _ 

memory  page  rate _ 

Compander  pa.ge  rate _ 

power  for  1  bit  register  @  205  MHz 
power  for  1  bit  register  @  25  6 
\mz 

lumber  of  output  solder  bumps" 
^205  MHz 

;ate  capacitance  ' 

letal  capacitancft _ 

ower  supply  voltage 

^idth  of  interconnects _ 

atic  power  dissipation  for  a 
inimum  sized  buffer  for 
e  clock  tree _ 

^acitance  of  a  solder  bump 
static  power  dissipation  for  a 
_sii^le  off-chip  dri ver 


value _ 

3.2  cm _ 

1.5  cm _ 

To24"  bits 
~4Kbits  ~ 
^05  MHz 
1 00  K  pages/sec 
25,6  MHz 
1  mWatt 
T/8  mWatt 


3fF 

•02  fF/fum)^ 
3.3  Volts 
3  um 

.5  m  Watts 


34.5  mWatts 


Table  5-8.  Parameters  used  to  estimate  the  power  required  for  the  processing  chip. 
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5.5.1  Power  for  chip  clock  distribution 

The  clOTk  is  assumed  to  be  distributed  with  a  H-cIock  tree.  The  equation  used  to  estimate 

dock  tr^  and  N,  ,s  number  of  gates  operating  at  205  MHz  at  each  leaf  that  receive  the  dock 

S  Ae  doctlh““ """  •“ sates  on^hip 

an?thl  third  t°  titive  each  interconnect  in  the  clock  tree^ 

It?,  ,  "  P™"'  in  'i-n  mock  tree.  The  capacitance  of  the 

clock  ITM  interconnect  was  estimated  using  reference  20.  Because  of  the  clock  speed  it  was 

prr.^n?tl  ™  ■"  "  '*8ime  and  thu^  would 

predominately  expenence  static  power  dissipation:  P.hb.  It  was  also  assumed  that  the  second 
dock  could  be  generated  from  the  first  at  the  base  of  the  clock  tree. 

Pchip  elk  =  Nieaf  (1/2  (Ng  Cg  +  Ci  Wi  Dc)  V^dD  fc  +  Pclk  b) 

where,  Njeaf  =  (M/n)^  and  Ng  =  9. 

5.5.2  Power  for  circuitry 

The  average  peak  power  for  the  circuitry  was  estimated  and  determined  to  be 

forTbriTfTj^  ft  7^"'  components  would  be  operating  at  once,  and  some  only 

for  a  bnef  penod  of  time.  This  was  taken  into  consideration. 

5.5.3  Output  driver  power 

hPlnw  ^  required  to  send  the  data  off-chip  was  estimated  with  the  equation 

below  As  with  the  clock  tree  calculation,  it  is  assumed  that  the  off-chip  drivers  and  the  buffers 

exn^r^Tnl^rt  Z  calculation  for  would  predominantely 

penence  static  power  dissipation.  The  capacitance  term  reflects  the  power  needed  to  drive  the 

wn,' mT  bumps,  respectively,  assuming  that  on  average  half  the  buffers 

would  be  switching  at  a  given  time. 

Poff-eWp  =  (1/2)  Noput  (Poff-chip  driver  +  l/2(Ci  Wj  Dccd/2  +  Cb)V^DD  fc) 

"^be  power  requirements  for  the  chip  are  summarized  in  Table  5-9. 

-  Power  (Watts) 

-Chip  circuitry _ ~10  Watts _ 

Chip  Clock _ _  ~6  Watts _ 

Power  for  chip  output  ~21  Watts 

Total  chip  power  -37  Watts 


Table  5-9:  Summary  of  estimate  for  the  power  for  the  processing  chip. 

5.6  Compander  power 

Table  5-10  lists  the  parameters  that  were  used  in  the  power  calculation  for  the  compander 
abng  with  their  assumed  values.  The  total  power  required  for  the  compander,  P,„,p,  is  estimated 
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Pcmp  PcCD  elk  Pemp  ctl  P drivers 

PccD  elk  is  the  power  for  the  clock  on  the  CCD;  Pemp  cti  is  the  control  power  for  the  CCD 
which  causes  the  pixels  in  each  super-pixel  to  be  read  out  in  the  correct  order;  Pdrivers  is  the 
electrical  power  need  to  amplify  the  detected  signals  and  to  send  them  to  the  processing  chip. 
The  compander  was  assumed  to  have  three  phased  clocking. _ 


|I88R!!KIRI 

descriotion 

value 

T1 

quantum  efficiency  of  CCD 
sensors 

50% 

Dccd 

linear  CCD  dimension 

3.25  cm 

CgCCD 

capacitance  for  single  CCD 
pixel 

10  fF 

Wj 

width  of  interconnect  lines 

3  um 

PcCDctl 

electrical  control  power  for 
each  super-pixel 

negligible 

dvia 

distance  between  vias  on 
Compander 

500  um 

compander  via  capacitance 

1  pF 

Cb 

solder  bump  capacitance 

1  pF 

Pc  clkB 

static  power  dissipation  of 
buffer  in  clock  tree 

.5  (1/8)  mWatts 

P 

amp 

static  power  dissipation  of 
amplifier  in  each  super-pixel 

.  1  mWatts 

Ci 

metal  capacitance 

.02  fF/(um)2 

Vdd 

power  supply  voltage  for 
compander 

3.3  Volts 

wavelength  of  detected  light 

640  nm 

PoDt 

optical  power  per  single  bit 

10  nWatts  12*1 

Table  5-10:  Parameters  used  to  estimate  the  power  for  the  compander. 

5.6.1  Power  for  compander  control  circuitry 

The  power  for  required  for  the  total  control  circuitry  was  determined  using  the  power  for 
a  single  bit  register  operating  at  n  fopt.  Because  of  the  slow  operational  speed,  P^mp  d  was  found 
to  be  very  small  and  was  neglected. 

5.6.2  Power  for  compander  clock  distribution 

The  clocks  signals  on  the  compander  were  also  assumed  to  be  distributed  with  an  H-tree. 
There  are  two  three  phased  clock  systems:  a  slow  clock  and  a  fast  clock.  The  slow  clock 
operates  at  a  rate:  nfopt  and  is  responsible  for  shifting  the  “rows”  of  each  super-pixel  into  faster  n 
bit  CCD  arrays  for  eventual  output.  The  faster  CCD  array  operates  at  the  compander  page  rate. 
All  CCD  sensors  are  driven  by  the  slower  clock.  MVn  additional  CCD  sensors  are  driven  by 
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the  faster  clock.  It  is  assumed  that  one  of  the  three  clocks  in  the  three  phase  clocking  scheme  can 
be  generated  at  the  leaf  nodes  for  both  the  slow  and  fast  clocking  systems.  The  equation  for  the 
power  required  for  clock  distribution  is  thus: 

PcCDcIk  “  (N|eaf  fast  +  Nieaf  slow/lt)  (PcClks  +  3/2  (CgCCD  (2/3)  C;  \^{  Dccd)  V  DD  fcmp) 
where,  Nuaffast  =  MV(3n)  and  Nkafsiow  =  MV3. 


5.6.3  Power  to  send  data  to  the  processing  chip 

It  was  first  verified  that  the  optical  power  per  bit  would  generate  enough  electrons  to  be 
reasonably  detected.  With  the  variables  in  Table  5-10,  the  number  electrons  that  are  generated 
per  bit  can  be  expressed  as: 

Ne  =  tlPopt/(fopthv))  =160K 


this  corresponds  to  the  following  voltage  on  each  CCD  gate: 


'^detection  (1.6X10  Ne)/CgCCD  =  2.56  VoltS 


This  voltage  could  easily  be  amplified  to  the  rail  voltage.  The  power  needed  to  amplify 
the  detected  signals  and  send  them  off  chip  is  estimated  with  the  equation  below.  The  first  term 
reflects  the  static  power  dissipation  of  the  amplifier.  The  second  term  accounts  for  the  power 
needed  to  drive  the  vias,  the  solder  bumps  and  the  lines  routing  the  data  from  the  compander  to 
the  processing  chip. 


^drivers  (M/n)  Psingle  driver 


where,  Psi„gie  driver  =  (M/n)  (  Pamp  +  1/2  (Cvia  +  Cb  +  (1/2)  Ci  Wi  Dccd)  V^dd  fcmp) 

The  power  breakdown  for  the  compander  is  summarized  in  Table  5-11.  The  power 


Power  (Watts) 

Compander  Clock 

~9  Watts 

Power  to  amplify  signal  and  send 
data  to  chip 

~2  Watts 

Total  compander  power 

~1 1  Watts 

Table  5-1 1 :  Summary  of  power  required  for  the  compander. 


6.  CONCLUSIONS 

The  3D  two-photon  memory,  in  general,  was  found  to  be  well  suited  for  very  large 
databases  with  low  write  requirements  because  of  its  potential  capacity,  raw  bandwidth  and 
random  access  capability.  It  was  determined  that  performance  can  be  significantly  improved  by 
utilizing  an  accessing  feature  termed  bi-orthogonal  access,  a  feature  which  allows  both  record 
and  field  parallel  access  to  be  supported  simultaneously.  It  was  noticed  that  the  way  that  records 
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are  placed  in  the  memory  affects  performance.  No  particular  record  placement/packing  strategy 
was  found  to  be  clearly  advantageous  for  all  operations.  In  a  system  this  choice  would  have  to 
be  made  by  anticipating  frequent  operations.  For  the  relation  size  that  was  considered  in  this 
study  (10^  records)  the  packing  strategy  was  found  to  have  greater  impact  than  the  effect  of 
memory  fragmentation.  This  effect  of  packing  would  reduce  with  larger  relation  sizes,  but 
would  increase  with  smaller  sized  relations. 

A  metric  termed  bit  retrieval  efficiency  was  devised  to  evaluate  the  optimality  of 
relational  database  operations  performed  with  a  bi-orthogonally  accessed  3D  optical  memory. 
The  results  of  this  analysis  indicate  that  the  throughput  and  the  random  access  capability  of  a 
memory  both  affect  the  performance  of  relational  database  machines.  A  3D  two-photon  memory 
with  bi-orthogonal  access  can  viewed  as  having  improved  random  access,  and  in  this  way 
improved  performance.  A  shortcoming  that  all  massively  parallel  access  optical  memories  have, 
in  terms  of  this  application,  is  that  an  entire  page  of  records  has  to  be  retrieved  even  if  only  one 
record  is  desired.  From  this  vantage  they  appear  to  have  reduced  random  access  capabilities.  A 
bi-orthogonally  accessed  3D  optical  memory  with  the  same  bandwidth  but  with  a  reduced  page 
size  would  have  improved  performance. 
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